Mutual learning of a pair of tree parity machines with continuous and discrete weight vectors is
studied analytically. The analysis is based on a mapping procedure that maps the mutual learning 
in tree parity machines onto mutual learning in noisy perceptrons. The stationary solution of the 
mutual learning in the case of continuous tree parity machines depends on the learning rate where 
a phase transition from partial to full synchronization is observed. In the discrete case the learning 
process is based on a finite increment and a full synchronized state is achieved in a finite number 
of steps. The synchronization of discrete parity machines is introduced in order to construct an 
ephemeral key-exchange protocol. The dynamic learning of a third tree parity machine (an attacker) 
that tries to imitate one of the two machines while the two still update their weight vectors is also 
analyzed. In particular, the synchronization times of the naive attacker and the flipping attacker 
recently introduced in [?] are analyzed. All analytical results are found to be in good agreement
with simulation results.

I. INTRODUCTION

Artificial neural networks are known for their ability to 
learn [?, ?]. They produce an output from a given input
according to some weight vector and a transfer function. 
Traditionally, there are two types of learning. One type 
is unsupervised learning where a network receives input 
and tries to learn about the input distribution. The other 
type is the teacher-student scenario, when the so-called 
teacher receives inputs, produces outputs and gives an- 
other machine, the so-called student, both the inputs and 
their assigned outputs. In such a scenario the teacher is 
static, i.e., its weight vector does not change during the 
learning, and the student tries to imitate the teacher so 
as to produce the same output in a new unknown exam- 
ple by dynamically updating its weight vector. The state 
in which the student achieves the same weight vector as 
that of the teacher and can therefore perform the same 
output as that of the teacher is referred to as perfect 
learning. 

During the last few years a new type of learning sce- 
nario has been introduced and is under discussion: the 
mutual learning procedure. In the mutual learning pro- 
cedure there is no distinction between the teacher role 
and the student role; both networks function the same 
way. They receive inputs, calculate the outputs and up- 
date their weight vector according to the match between 
their mutual outputs [?, ?]. This is an online learning
procedure where in each step one input vector is given, 
the output in both machines is calculated and the re- 
sulting increment of each weight vector is added accord- 
ingly. It was found that perceptrons that undergo mutual 
learning might end up in a synchronized state when the 
weight vectors of both machines are either parallel (exactly the same) or anti-parallel (exactly
the opposite), depending on their specific updating rule. The stationary synchronized solution is
equivalent to the stationary perfect learning solution in the teacher-student scenario. We
extend the analysis of mutual learning between perceptrons to mutual learning between parity
machines. We introduce a generic method of analyzing mutual learning in feedforward tree
multi-layer networks, where we concentrate on the tree parity machine (TPM) [?, ?]. The method
is based on a mapping procedure that maps the mutual learning in TPMs onto mutual learning in
noisy perceptrons.

A novel cryptosystem composed of two parity machines 
that synchronize has recently attracted much attention 
[?]. A host of simulation results show that
discrete TPMs can synchronize very fast, whereas a third machine
that tries to learn their weight vector achieves only
partial success. These properties make mutual learning 
in TPMs attractive for applications in secure communi- 
cations, as an information-bearing message can be hid- 
den within a complicated structure of the TPM's weight 
vectors and still be reconstructed at the receiver using 
another TPM whose parameters are exactly matched to 
those of the first one. This type of cryptosystem can 
provide a new basis for security much different from cur- 
rently used cryptosystems that involve large integers and 
are based upon number theory [?].

The discrete machines studied carried out an updat- 
ing procedure different from the conventional learning 
procedures analyzed in neural networks. In the discrete 
machine procedure the increment of the weight vector in 
each step is finite and not infinitesimally small. Since the 
methods of analyzing discrete on-line learning in contemporary
research, see [14] and [?], are not applicable
to this case, we introduce here a novel method for ana- 
lyzing mutual learning in networks with discrete weight 
vectors and a learning process that is based on a finite in- 
crement. First, we describe mutual learning with discrete 
perceptrons, and then we exploit the method of mapping 
mutual learning between TPMs onto mutual learning be- 
tween noisy perceptrons and analyze mutual learning in 
discrete TPMs.

In cryptography, one of the most important aspects
of the channel is its security. Therefore, potential al- 
gorithms of eavesdroppers are included in our analy- 
sis. Such algorithms are actually sophisticated learning 
procedures where the parties are the teachers and their 
weights are time dependent, and the eavesdropper is the 
student. In the following we name this time-dependent- 
teacher-student scenario dynamic learning. 



In this paper we analyze mutual learning and dynamic learning in TPMs of two kinds: machines
with continuous weight vectors (the spherical constraint, see Eq. (2) below) and machines with
discrete weight vectors and a finite increment (see Eq. (3) below). We introduce a method that
maps mutual learning in two-layered parity machines onto mutual learning in noisy perceptrons.
The spherical tree parity machine is studied using the same toolbox used for studying mutual
learning in the perceptron [?]. The interesting behavior of full synchronization in one regime
of the learning rate and partial synchronization in the other regime is also found in the
mutual learning of TPMs. Mutual learning in a TPM with continuous weight vectors is described
by equations of motion that reveal the evolution of the order parameters in time. The
derivation of the equations of motion is based on the assumption that the order parameters are
self-averaging quantities [19]. This assumption is violated when the increment of the weight
vectors in each step is finite and not infinitesimally small, as in the case of the discrete
weight vectors studied here. Therefore we develop different analytical tools for the case of
discrete weight vectors.



This paper is an extension of [?]. It contains a full, detailed description of the analytical
methods and discussions that were not included in [?]. An advanced attack suggested recently by
Shamir et al. [?], the flipping attack, is also analyzed. The paper is organized as follows: in
section II we introduce the TPM model. We employ a general framework to present its application
to cryptography in II A. The dynamics studied are presented in II B, and the order parameters
and local field distributions are discussed in II C. The mapping procedure is detailed in III.
The learning in continuous TPMs is given in IV, where we divided the section into mutual
learning (section IV A) and dynamic learning (section IV B). The section is summarized and the
results are discussed in IV C. Discrete learning is presented in section V. We first describe
mutual learning in perceptrons in V A. The extension to mutual learning in parity machines is
given in V B. Two dynamic learning attacks are studied: the naive attacker (in V C) and the
flipping attacker (in V D). A discussion and an overview are given in V E. All analytical
results are found to be in good agreement with simulation results, as indicated in each section.

Figure 1: A tree parity machine N : 3 : 1



II. THE MODEL 



We consider a TPM with $K$ binary hidden units $\tau_i = \pm 1$, $i = 1, \ldots, K$, feeding a
binary output, $\sigma = \prod_{i=1}^{K} \tau_i$, see Figure 1. The networks consist of either a
continuous or a discrete coupling vector $\mathbf{w}_i = (W_{1i}, \ldots, W_{Ni})$ and disjoint
sets of inputs $\mathbf{x}_i = (X_{1i}, \ldots, X_{Ni})$ containing $N$ elements each. The input
elements are random variables with zero mean and unit variance. We confine the input components
to $X_{ji} = \pm 1$ without loss of generality. The local field in the $i$th hidden unit is
defined as

$$h_i = \frac{1}{\sqrt{N}}\, \mathbf{w}_i \cdot \mathbf{x}_i , \qquad (1)$$

and the output of the $i$th hidden unit is derived by taking the sign of the local field. The
output of the tree parity machine is therefore given by

$$\sigma = \prod_{i=1}^{K} \tau_i = \prod_{i=1}^{K} \mathrm{sign}(h_i) .$$

Our analysis is limited to TPMs with three hidden units, $K = 3$, merely for simplicity of
presentation. The extension of the formalism to any number of hidden units is straightforward.
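As an illustration of the definitions above, the following minimal sketch evaluates the hidden outputs and the parity output of a single TPM. The function names, the toy sizes and the tie-break for a vanishing local field are assumptions made for this example only; the $1/\sqrt{N}$ prefactor does not affect the signs.

```python
import math


def sign(h):
    """Sign of a local field; a zero field is mapped to +1 (an assumed tie-break)."""
    return 1 if h >= 0 else -1


def tpm_forward(weights, inputs):
    """Return (hidden outputs, parity output) of one tree parity machine.

    weights, inputs: K lists with N numbers each, one list per hidden unit.
    """
    n = len(weights[0])
    fields = [sum(w * x for w, x in zip(w_i, x_i)) / math.sqrt(n)
              for w_i, x_i in zip(weights, inputs)]
    taus = [sign(h) for h in fields]
    return taus, math.prod(taus)


# Toy sizes chosen for illustration: K = 3 hidden units, N = 4 inputs each.
weights = [[1, -2, 0, 3], [0, 1, 1, -1], [2, 2, -1, 0]]
inputs = [[1, 1, -1, 1], [-1, 1, -1, 1], [1, -1, 1, 1]]
taus, sigma = tpm_forward(weights, inputs)
print(taus, sigma)  # [1, -1, -1] 1
```

The disjoint receptive fields show up in the code as one input list per hidden unit; no input element is shared between units.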

The weight vectors of the TPMs are initiated at random according to a certain constraint. We
studied two different cases. In the first, the weight vectors are confined to a sphere,

$$\sum_{j=1}^{N} W_{ji}^2 = N , \qquad (2)$$

and are initiated randomly according to a Gaussian distribution. In the second, there is a
finite number of available integer values that each component of the weight vector can take,

$$W_{ji} = \pm L, \pm (L-1), \ldots, \pm 1, 0 , \qquad (3)$$

and the weight vector components are initiated at random from a flat distribution with equal
probability for each value. These two scenarios are referred to as the continuous case and the
discrete case.
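The two initializations can be sketched as follows. The helper names are hypothetical, and the radius of the sphere (squared norm $N$) is an assumption of this sketch; any fixed radius would serve the same purpose.

```python
import math
import random


def init_discrete(k, n, depth, rng):
    """K weight vectors with components drawn uniformly from {-L, ..., L}."""
    return [[rng.randint(-depth, depth) for _ in range(n)] for _ in range(k)]


def init_spherical(k, n, rng):
    """K Gaussian weight vectors rescaled to squared norm N (assumed radius)."""
    vectors = []
    for _ in range(k):
        w = [rng.gauss(0.0, 1.0) for _ in range(n)]
        scale = math.sqrt(n / sum(c * c for c in w))
        vectors.append([c * scale for c in w])
    return vectors


rng = random.Random(0)
discrete = init_discrete(k=3, n=100, depth=3, rng=rng)
spherical = init_spherical(k=3, n=100, rng=rng)
print(min(min(w) for w in discrete), max(max(w) for w in discrete))
```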

We studied the mutual and dynamic learning of such TPMs in various scenarios where the initial
randomly selected weight vector is the unknown secret information. Two machines, A and B,
perform mutual learning and try to synchronize by updating their weights according to the match
between their outputs, such that at the end they achieve full synchronization. The third
machine, C, performs dynamic learning: it tries to learn the weight vectors of one of the two
machines, say A, and uses an attack strategy to update its weight vectors such that at the end
of the procedure they will be identical to the weight vectors of party A. The application of
these procedures to the field of cryptography is discussed in the following section.



A. Cryptography Based on Synchronization:
General Framework



Before we develop the detailed equations for mutual 
learning in TPMs, we introduce the general concept 
of synchronization and learning in discrete parity ma- 
chines in terms of a mean-field-like approach, and dis- 
cuss the qualitative ability to construct an ephemeral 
key-exchange protocol based on mutual learning between 
TPMs. 

First, let us consider two parties A and B who wish to agree on a secret key over a public
channel. The weight vectors, $\mathbf{w}_i^{A/B}$, are the parameters of each unit, which are
changed during the training procedure. Both parties start with secret initial parameters
$\mathbf{w}$, which may be generated randomly. After a number of training steps, the set of
parameters is synchronized and becomes the time-dependent common key. At each training step a
common random input $\mathbf{x}_i$ is generated for both of the parties; it is public and known
to possible eavesdroppers.

Each party of the secure channel consists of three hid- 
den units with corresponding three parameter vectors. 
For a given input $\mathbf{x}_i$ each party calculates an output bit
$\sigma^{A/B}$ and sends it over the public channel. A training
step is performed only if the two output bits disagree and 
only for the hidden units which agree with their output 

$$\Delta \mathbf{w}_i^{A/B} = g\left(\sigma^{A/B} \mathbf{x}_i\right)\, \theta\left(-\sigma^A \sigma^B\right)\, \theta\left(\sigma^{A/B} \tau_i^{A/B}\right) , \qquad (4)$$

where $g$ is an odd function. As an example, consider the following configuration of the hidden
units: $+\,+\,+$ for TPM A and $-\,+\,+$ for TPM B. The output bits have the values
$\sigma^A = 1$, $\sigma^B = -1$. Hence A trains all of its units according to $\mathbf{x}_i$,
while B changes only the weight vector of its first unit according to $-\mathbf{x}_i$.

Synchronization between the two machines indicates a full anti-parallel state where each machine
produces exactly the opposite output of the other for any given input. The success of
synchronization can be measured by the probability of an incoherent state, i.e., the probability
of having the same output instead of the opposite one. The probability for an incoherent state,
$\epsilon_i^m$, that two corresponding hidden units are mistaken and, instead of producing
exactly the opposite output, agree on a random input, is given by

$$\epsilon_i^m = \mathrm{Prob}\left(\tau_i^A(\mathbf{x}_i, \mathbf{w}_i^A) = \tau_i^B(\mathbf{x}_i, \mathbf{w}_i^B)\right) . \qquad (5)$$

The function $g$ used for training must be chosen so that on average (over random inputs)
$\epsilon_i^m$ is decreased. In this section we simplify the presentation by assuming symmetry
among the three hidden units, $\epsilon_i^m = \epsilon^m$. The full detailed description of the
dynamical process beyond this mean-field-like framework is given in section V.

It is now easy to see that as soon as the TPMs are synchronized they will remain synchronized,
i.e., if $\mathbf{w}_i^A = -\mathbf{w}_i^B$ for all $i$, then $\sigma^A = -\sigma^B$ and will
remain so. A training step in a unit $i$ is performed only if both output bits disagree and if
the two $\tau_i$ disagree accordingly. Hence, after the synchronized state is achieved they
either perform a coherent training step or they do not change their parameters (referred to as
a quiet step). A pair of synchronized hidden units performs a kind of random walk in parameter
space but remains synchronized.

This is different when the two hidden units are not identical. Let us consider the first hidden
unit, where there are four distinct cases:

(a) $\sigma^A = \sigma^B$: nothing moves and the next step is performed.

(b) $\tau_1^A = \sigma^A$, $\tau_1^B = \sigma^B$, $\sigma^A = -\sigma^B$: both parameter vectors
$\mathbf{w}_1^A$ and $\mathbf{w}_1^B$ are coherently changed.

(c) $\tau_1^A = \sigma^A$, $\tau_1^B \neq \sigma^B$, $\sigma^A = -\sigma^B$: only one parameter
vector is changed and moves incoherently, hence $\epsilon^m$ increases.

(d) $\tau_1^A \neq \sigma^A$, $\tau_1^B \neq \sigma^B$, $\sigma^A = -\sigma^B$: both parameter
vectors are not changed.

The probability of finding these four cases can be calculated from the knowledge of
$\epsilon^m$. For example, the probability of finding the configuration shown above, $+\,+\,+$
and $-\,+\,+$, is $\frac{1}{8}(1 - \epsilon^m)(\epsilon^m)^2$. All 64 configurations can be
divided into three categories: the probability of having an attractive step, $p_a$ (case (b));
the probability of having a repulsive step, $p_r$ (case (c)); and the probability of having a
quiet step, $p_q$ (cases (a) and (d)). These probabilities are found to be

$$p_a = \frac{1}{2}\left[(1 - \epsilon^m)^3 + (1 - \epsilon^m)(\epsilon^m)^2\right] , \qquad
p_r = 2 (1 - \epsilon^m)(\epsilon^m)^2 , \qquad
p_q = 1 - p_a - p_r . \qquad (6)$$
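The counting over the 64 configurations can be checked by brute force. The sketch below is an illustration under the mean-field assumptions stated in this section (uniform internal representations of A, and an independent agreement probability $\epsilon^m$ per pair of hidden units); the function name is hypothetical. It classifies each configuration of the first hidden unit as attractive, repulsive or quiet and compares the totals with the closed forms $p_a = \frac{1}{2}[(1-\epsilon^m)^3 + (1-\epsilon^m)(\epsilon^m)^2]$ and $p_r = 2(1-\epsilon^m)(\epsilon^m)^2$.

```python
from itertools import product
from math import isclose, prod


def step_probabilities(eps):
    """Enumerate the 64 joint configurations of two K = 3 parity machines.

    eps is the probability that two matching hidden units agree (incoherent state).
    Returns (p_attractive, p_repulsive, p_quiet) for the first hidden unit.
    """
    p_a = p_r = 0.0
    for tau_a in product((1, -1), repeat=3):
        for agree in product((True, False), repeat=3):
            tau_b = tuple(t if a else -t for t, a in zip(tau_a, agree))
            weight = 1 / 8
            for a in agree:
                weight *= eps if a else 1 - eps
            sigma_a, sigma_b = prod(tau_a), prod(tau_b)
            if sigma_a == sigma_b:
                continue  # case (a): no training step at all
            moves_a = tau_a[0] == sigma_a
            moves_b = tau_b[0] == sigma_b
            if moves_a and moves_b:
                p_a += weight  # case (b): coherent move of both units
            elif moves_a or moves_b:
                p_r += weight  # case (c): only one unit moves
    return p_a, p_r, 1 - p_a - p_r


for eps in (0.0, 0.1, 0.5, 0.9):
    p_a, p_r, p_q = step_probabilities(eps)
    assert isclose(p_a, 0.5 * ((1 - eps) ** 3 + (1 - eps) * eps ** 2), abs_tol=1e-12)
    assert isclose(p_r, 2 * (1 - eps) * eps ** 2, abs_tol=1e-12)
    print(eps, round(p_a, 4), round(p_r, 4), round(p_q, 4))
```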



In the remainder of this section the three probabilities above are employed in order to explain
the synchronization phenomenon, and to demonstrate the superiority of the synchronization
process over a possible attacker that also tries to synchronize with A and B.

Close to synchronization, $\epsilon^m \sim 0$, the probability of having a repulsive step scales
as $p_r \sim (\epsilon^m)^2$, whereas the probability of having an attractive step is
$p_a \sim \frac{1}{2}$ (quiet steps are always possible). Let us assume that the change of the
error $\epsilon^m$ depends only on a function of $\epsilon^m$ itself. Later we will derive the
exact equations, which are more complex. Then, the average change in $\epsilon^m$ in one step is
obtained by

$$\langle \Delta \epsilon^m \rangle = -a(\epsilon^m)\, p_a + r(\epsilon^m)\, p_r , \qquad (7)$$

where $a(\epsilon^m)$ and $r(\epsilon^m)$ are the magnitudes of the change of the error in an
attractive and in a repulsive step, respectively. Close to synchronization a repulsive step
affects all of the parameters while an attractive step can only synchronize the few parameters
which are not yet identical. Hence we expect for small values of $\epsilon^m$:

$$a(\epsilon^m) \simeq a_0\, \epsilon^m , \qquad r(\epsilon^m) \simeq r_0 . \qquad (8)$$

Therefore, in the leading order one obtains
$\langle \Delta \epsilon^m \rangle \propto -a_0 \epsilon^m$. Close to synchronization the
attractive force is dominant, independent of the detailed mechanism of learning. The parity
machine suppresses the repulsive steps by reducing their frequency of appearance.

This relation does not hold for the committee machine, which maps the hidden units to their
majority vote, $\sigma = \mathrm{sign}(\tau_1 + \tau_2 + \tau_3)$ [?, ?]. For this case one finds

$$p_a = \frac{3}{4}(1 - \epsilon^m)^3 + (1 - \epsilon^m)^2 \epsilon^m + \frac{1}{2}(1 - \epsilon^m)(\epsilon^m)^2 ,$$
$$p_r = \frac{1}{2}(1 - \epsilon^m)^2 \epsilon^m + (1 - \epsilon^m)(\epsilon^m)^2 . \qquad (9)$$

Now, close to synchronization $p_r \sim \epsilon^m$, and the repulsive and attractive forces are
of the same order, Eq. (7). This competition between attraction and repulsion supports
possible attackers, as discussed below.
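The same brute-force counting can be repeated with a majority output in place of the parity output. The sketch below is an illustration under the same mean-field assumptions as before (uniform internal representations, independent agreement probability $\epsilon^m$ per unit); the closed forms it is compared with are the ones this enumeration yields, $p_a = \frac{3}{4}(1-\epsilon^m)^3 + (1-\epsilon^m)^2\epsilon^m + \frac{1}{2}(1-\epsilon^m)(\epsilon^m)^2$ and $p_r = \frac{1}{2}(1-\epsilon^m)^2\epsilon^m + (1-\epsilon^m)(\epsilon^m)^2$. The function names are hypothetical.

```python
from itertools import product
from math import isclose


def majority(taus):
    """Output of a committee machine with an odd number of +/-1 hidden units."""
    return 1 if sum(taus) > 0 else -1


def committee_step_probabilities(eps):
    """Attractive and repulsive step probabilities of the first hidden unit.

    eps is the probability that two matching hidden units agree.
    """
    p_a = p_r = 0.0
    for tau_a in product((1, -1), repeat=3):
        for agree in product((True, False), repeat=3):
            tau_b = tuple(t if a else -t for t, a in zip(tau_a, agree))
            weight = 1 / 8
            for a in agree:
                weight *= eps if a else 1 - eps
            sigma_a, sigma_b = majority(tau_a), majority(tau_b)
            if sigma_a == sigma_b:
                continue  # no training step
            moves_a = tau_a[0] == sigma_a
            moves_b = tau_b[0] == sigma_b
            if moves_a and moves_b:
                p_a += weight
            elif moves_a or moves_b:
                p_r += weight
    return p_a, p_r


for eps in (0.0, 0.05, 0.5, 1.0):
    p_a, p_r = committee_step_probabilities(eps)
    q = 1 - eps
    assert isclose(p_a, 0.75 * q ** 3 + q ** 2 * eps + 0.5 * q * eps ** 2, abs_tol=1e-12)
    assert isclose(p_r, 0.5 * q ** 2 * eps + q * eps ** 2, abs_tol=1e-12)
    print(eps, round(p_a, 4), round(p_r, 4))
```

For small `eps` the repulsive probability is linear in `eps`, in contrast to the quadratic behavior of the parity machine.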

Let us go back to the parity output and consider an attacker C who knows all the details of the
algorithm and can listen to the communication between A and B. The initial configurations of
the parameters of A and B are unknown to her. The attacker C has the same architecture (TPM),
the same number of hidden units (3) and uses the same learning algorithm, Eq. (4). What is a
good algorithm for C to synchronize, i.e., to learn A and to be anti-parallel to B? If C is
synchronized then she should remain so. Hence she should use the identical training step in
case of agreement with A. Let us consider an attacker C who simulates party A after
synchronization between A and B is achieved. C uses the complete algorithm explained above for
party A. This means that A always makes some moves of her parameters while C moves her
parameters corresponding to the units whose output bits $\tau_i^C$ are identical to $\sigma^A$
(in the following we name this attack the naive attack, see section V C). This strategy for C
generates many repulsive steps between C and A. In fact, assuming the error between all
matching units is the same, $\epsilon^m = \mathrm{Prob}(\tau_i^C \neq \tau_i^A)$ (where we use
the same symbol $\epsilon^m$ as in Eq. (5); although seemingly different, in both cases it
refers to the error, see section II C and Eq. (17) below), and summing up all possibilities
yields

$$p_a = \frac{1}{2}(1 - \epsilon^m)^3 + \frac{1}{2}(1 - \epsilon^m)(\epsilon^m)^2 + (1 - \epsilon^m)^2 \epsilon^m ,$$
$$p_r = (1 - \epsilon^m)^2 \epsilon^m + 2 (1 - \epsilon^m)(\epsilon^m)^2 + (\epsilon^m)^3 . \qquad (10)$$

Figure 2: The ratio between $p_r$ and $p_a$ as a function of $\epsilon^m$ in the case of mutual
learning in TPMs, Eq. (6) (solid line), and in the case of the naive attack, Eq. (10) (dashed
line).

The essential difference between party A and attacker C is that the probability of finding a
repulsive step scales with $(\epsilon^m)^2$ in the mutual learning between A and B and scales
with $\epsilon^m$ in the dynamic learning between C and A, close to synchronization. A and B
react to their mutual output while C cannot influence A; this yields a different behavior for
small values of the error $\epsilon^m$.
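A brute-force count for the naive attacker, again a sketch under the mean-field assumptions of this section with hypothetical function names, is given below. Here `eps` is the probability that a hidden unit of C differs from the matching unit of A, A moves every unit that agrees with $\sigma^A$, and C moves every unit of hers that agrees with $\sigma^A$. The enumeration reduces to $p_a = (1-\epsilon^m)/2$ and $p_r = \epsilon^m$, which are the sums of the expanded expressions. The last helper evaluates the ratio $p_r/p_a$ for mutual learning, assuming $p_a = \frac{1}{2}[(1-\epsilon^m)^3 + (1-\epsilon^m)(\epsilon^m)^2]$ and $p_r = 2(1-\epsilon^m)(\epsilon^m)^2$.

```python
from itertools import product
from math import isclose, prod


def naive_attack_probabilities(eps):
    """Attractive and repulsive step probabilities between attacker C and party A."""
    p_a = p_r = 0.0
    for tau_a in product((1, -1), repeat=3):
        for differs in product((True, False), repeat=3):
            tau_c = tuple(-t if d else t for t, d in zip(tau_a, differs))
            weight = 1 / 8
            for d in differs:
                weight *= eps if d else 1 - eps
            sigma_a = prod(tau_a)
            moves_a = tau_a[0] == sigma_a
            moves_c = tau_c[0] == sigma_a  # C imitates A and uses A's public output
            if moves_a and moves_c:
                p_a += weight
            elif moves_a or moves_c:
                p_r += weight
    return p_a, p_r


def mutual_ratio(eps):
    """p_r / p_a for mutual learning between A and B (assumed closed forms)."""
    return 2 * (1 - eps) * eps ** 2 / (0.5 * ((1 - eps) ** 3 + (1 - eps) * eps ** 2))


for eps in (0.05, 0.2, 0.5, 0.8):
    p_a, p_r = naive_attack_probabilities(eps)
    assert isclose(p_a, (1 - eps) / 2, abs_tol=1e-12)
    assert isclose(p_r, eps, abs_tol=1e-12)
    print(eps, round(mutual_ratio(eps), 4), round(p_r / p_a, 4))
```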

The full behavior of the ratio $p_r / p_a$, derived from Eqs. (6) and (10), as a function of
$\epsilon^m$ is presented in Figure 2. It is clear that at any value of $\epsilon^m$ the
performance of the mutual learning is better than the performance of the naive attacker, who
performs many more repulsive moves relative to her attractive moves. Therefore, a more
sophisticated attacker was recently suggested in [?]: the flipping attacker. Her performance
cannot be measured in the scope of this general framework, since her strategy depends on the
local fields in the hidden units and therefore cannot be included under the rubric of Eq. (4),
where $g$ depends only on $\sigma \mathbf{x}_i$.

In the following, before delving into details, we introduce the dynamics (Eq. (4)) more
specifically. We discuss some of the relevant order parameters and their distributions. We
present the strategy of the flipping attacker and an intuitive explanation for her success.

B. The Dynamics

In principle, one can consider the following classes of dynamics that lead to a synchronized
state:

(A) The parties update their weight vectors whenever their outputs mismatch
($\sigma^A \neq \sigma^B$, as appears in Eq. (4)), and each unit updates according to the input
multiplied by the opposite of its output.

(B) The parties update their weight vectors whenever their outputs mismatch
($\sigma^A \neq \sigma^B$, as appears in Eq. (4)), and each unit updates according to the input
multiplied by its output.

(C) The parties update their weight vectors whenever their outputs match
($\sigma^A = \sigma^B$), and each unit updates according to the input multiplied by the opposite
of its output.

(D) The parties update their weight vectors whenever their outputs match
($\sigma^A = \sigma^B$), and each unit updates according to the input multiplied by its output.

In all the dynamics mentioned above, the $i$th hidden unit is updated only if it matches the
overall output of that party, i.e., if $\tau_i = \sigma$. The two parties that try to
synchronize might end up in an anti-parallel state (cases (A) and (B)) or in a parallel state
(cases (C) and (D)). Although Eq. (4) does not describe cases (C) and (D), the discussion in
section II A is relevant to all cases.



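In this paper we introduce a detailed presentation of case (A). In each step an update is made
only if both machines, A and B, disagree, $\sigma^A \neq \sigma^B$, and each unit updates
according to the input multiplied by the opposite of its output. In the spherical case we
normalize the weight vector after each update such that its norm does not change. The
dependence of the weight vector in a new step on the weight vector in the former one in the
continuous case is

$$\mathbf{w}_i^{A+} = \sqrt{N}\, \frac{\mathbf{w}_i^A + \frac{\eta}{\sqrt{N}}\, \mathbf{x}_i\, \theta(-\sigma^A \sigma^B)\, \theta(\sigma^A \tau_i^A)\, \sigma^B}{\left| \mathbf{w}_i^A + \frac{\eta}{\sqrt{N}}\, \mathbf{x}_i\, \theta(-\sigma^A \sigma^B)\, \theta(\sigma^A \tau_i^A)\, \sigma^B \right|} ,$$
$$\mathbf{w}_i^{B+} = \sqrt{N}\, \frac{\mathbf{w}_i^B + \frac{\eta}{\sqrt{N}}\, \mathbf{x}_i\, \theta(-\sigma^A \sigma^B)\, \theta(\sigma^B \tau_i^B)\, \sigma^A}{\left| \mathbf{w}_i^B + \frac{\eta}{\sqrt{N}}\, \mathbf{x}_i\, \theta(-\sigma^A \sigma^B)\, \theta(\sigma^B \tau_i^B)\, \sigma^A \right|} , \qquad (11)$$

where $\theta(y)$ is the Heaviside function, i.e., it equals zero for $y < 0$ and 1 otherwise,
$\eta$ is the learning rate and $i = 1, \ldots, K$. The analysis of the dynamics is in the
thermodynamic limit, where $N \to \infty$ and the weight vectors are updated by an infinitely
small quantity in each step.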
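A single normalized update of one hidden unit can be sketched as follows. The function names are hypothetical, and the scaling of the increment is an assumption of this sketch: the step is taken as $\eta |\mathbf{w}| / N$, so that it is small compared with the norm of the weight vector for large $N$ whatever radius is used for the sphere.

```python
import math


def should_update(tau_own, sigma_own, sigma_other):
    """Case (A): the outputs mismatch and the unit agrees with its own machine."""
    return sigma_own != sigma_other and tau_own == sigma_own


def spherical_update(w, x, sigma_other, eta):
    """Move w along x * sigma_other and rescale it back to its original norm."""
    n = len(w)
    radius = math.sqrt(sum(c * c for c in w))
    step = eta * radius / n  # assumed scaling of the increment
    moved = [c + step * xi * sigma_other for c, xi in zip(w, x)]
    norm = math.sqrt(sum(c * c for c in moved))
    return [c * radius / norm for c in moved]


w = [2.0, 0.0, 0.0, 0.0]  # toy vector with squared norm N = 4
x = [1, -1, 1, 1]
if should_update(tau_own=1, sigma_own=1, sigma_other=-1):
    w = spherical_update(w, x, sigma_other=-1, eta=0.5)
print(w, math.sqrt(sum(c * c for c in w)))
```

Since the mismatch condition implies $\sigma^B = -\sigma^A$, moving along $\mathbf{x}_i \sigma^B$ is the same as moving along the input multiplied by the opposite of the unit's own output.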

In the discrete scenario, the update is made in a similar manner, yet there are two important
differences from the point of view of the dynamics. The first is that in each step the vectors'
components are changed to the next integer value and not by an infinitesimally small amount as
in the continuous case (Eq. (11)). The second difference is that when there is an update, the
components which have reached the boundary value, $W_{ji} = \pm L$, and whose absolute value
would be increased, $W_{ji}^{+} = \pm (L + 1)$, are not changed and remain at the boundary
value. Mathematically, the learning is phrased as follows:

$$\mathbf{w}_i^{A+} = \mathbf{w}_i^A + D(\mathbf{w}_i^A \mathbf{x}_i \sigma^B)\, \mathbf{x}_i\, \sigma^B\, \theta(\sigma^A \tau_i^A)\, \theta(-\sigma^A \sigma^B) ,$$
$$\mathbf{w}_i^{B+} = \mathbf{w}_i^B + D(\mathbf{w}_i^B \mathbf{x}_i \sigma^A)\, \mathbf{x}_i\, \sigma^A\, \theta(\sigma^B \tau_i^B)\, \theta(-\sigma^A \sigma^B) , \qquad (12)$$

where $D(y) = 1 - \delta_{L,y}$ is applied to each component and $\delta$ is the Kronecker delta
function.
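The discrete dynamics can be simulated directly. The following is a minimal sketch, not the simulation code used for the results reported here: the sizes, the seed and the function names are arbitrary, and the treatment of a vanishing local field (mapped to $+1$ in A and to $-1$ in B, so that the anti-parallel state is preserved) is an assumption of this example.

```python
import random
from math import prod


def hidden_outputs(weights, inputs, tie):
    """Signs of the local fields; a zero field is mapped to `tie` (assumed convention)."""
    taus = []
    for w_i, x_i in zip(weights, inputs):
        h = sum(w * x for w, x in zip(w_i, x_i))
        taus.append(tie if h == 0 else (1 if h > 0 else -1))
    return taus


def bounded_step(w_i, x_i, direction, depth):
    """Add x * direction to each component unless it would leave [-L, L]."""
    for j, x in enumerate(x_i):
        new = w_i[j] + x * direction
        if abs(new) <= depth:
            w_i[j] = new


def synchronize(k=3, n=10, depth=3, seed=0, max_steps=100_000):
    """Run case (A) until w^A = -w^B; return (number of steps or None, w_a, w_b)."""
    rng = random.Random(seed)
    w_a = [[rng.randint(-depth, depth) for _ in range(n)] for _ in range(k)]
    w_b = [[rng.randint(-depth, depth) for _ in range(n)] for _ in range(k)]
    for step in range(max_steps):
        if all(a == -b for u, v in zip(w_a, w_b) for a, b in zip(u, v)):
            return step, w_a, w_b
        x = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(k)]
        tau_a = hidden_outputs(w_a, x, tie=1)
        tau_b = hidden_outputs(w_b, x, tie=-1)
        sigma_a, sigma_b = prod(tau_a), prod(tau_b)
        if sigma_a == sigma_b:
            continue  # outputs match: no update in case (A)
        for i in range(k):
            if tau_a[i] == sigma_a:
                bounded_step(w_a[i], x[i], sigma_b, depth)
            if tau_b[i] == sigma_b:
                bounded_step(w_b[i], x[i], sigma_a, depth)
    return None, w_a, w_b


steps, w_a, w_b = synchronize()
print("synchronized after", steps, "steps")
```

The check `abs(new) <= depth` plays the role of $D(y)$: a component at the boundary that would be pushed to $\pm(L+1)$ is left unchanged.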



C. Order Parameters and Joint Probability 
Distributions 



The analysis of learning in neural networks with an infinite number of weight vector components
is based upon a statistical mechanics analysis of several order parameters. The standard order
parameters used are

$$Q_i^m = \frac{1}{N}\, \mathbf{w}_i^m \cdot \mathbf{w}_i^m , \qquad
R_i^{m,n} = \frac{1}{N}\, \mathbf{w}_i^m \cdot \mathbf{w}_i^n , \qquad (13)$$

where the index $i$ represents the $i$th hidden unit, $i = 1, \ldots, K$, and $m, n$ denote the
specific party, $m, n \in \{A, B, C\}$. The angle $\theta_i$ between each pair of weight vectors
is given by the normalized overlap between the weight vectors,

$$\rho_i^{m,n} = \cos \theta_i^{m,n} = \frac{R_i^{m,n}}{\sqrt{Q_i^m Q_i^n}} . \qquad (14)$$

We assume that there are no direct correlations between different hidden units due to the tree
architecture, and therefore the overlap between different units is zero.

In the framework of the statistical mechanics analysis of on-line learning, the order parameters
play an important role in taking the averages over the random inputs, or equivalently over the
local field distribution. According to the central limit theorem, the joint probability
distribution of the local fields in each triplet of matching hidden units taken from the three
different machines depends only on the set of order parameters,
$P(h^A, h^B, h^C | \{R, Q\})$ (where we omitted the subscript $i$ from all parameters), and can
be found from the correlation matrix. When all weight vectors are normalized, $Q^m = 1$, it is
found to be

$$P(h^A, h^B, h^C | \{\rho\}) = \frac{\exp\left(-\frac{F}{2E}\right)}{(2\pi)^{3/2} \sqrt{E}} , \qquad (15)$$

where

$$F = (h^A)^2 G^A + (h^B)^2 G^B + (h^C)^2 G^C - 2 h^A h^B D^C - 2 h^A h^C D^B - 2 h^C h^B D^A ,$$
$$E = 1 - (\rho^{A,B})^2 - (\rho^{A,C})^2 - (\rho^{B,C})^2 + 2 \rho^{A,B} \rho^{A,C} \rho^{B,C} ,$$

$G^k = 1 - (\rho^{l,m})^2$, $D^k = \rho^{l,m} - \rho^{k,m} \rho^{k,l}$, and $k, l, m$ are the
three different parties in $\{A, B, C\}$. This complicated expression can be much simplified if
we assume that the two machines, A and B, are already anti-parallel. In that case the joint
probability distribution of the local fields is given by

$$P(h^A, h^B, h^C | \rho^{A,C}) = \delta(h^A + h^B)\, \frac{1}{2\pi \sqrt{1 - (\rho^{A,C})^2}}\,
\exp\left(-\frac{1}{2}\, \frac{(h^C)^2 + (h^A)^2 - 2 h^A h^C \rho^{A,C}}{1 - (\rho^{A,C})^2}\right) , \qquad (16)$$

where $\delta(\cdot)$ stands for the Dirac delta function.

At this stage it is possible to calculate the probabilities defined in section II A and to show
that indeed $\epsilon^m$ has the same meaning and the same dependence on $\rho$ in the two
cases: Eq. (5) and later when the attacker is introduced. Averaging over the local field
distributions results, in the case of mutual learning, in
$\epsilon^m = 1 - \frac{1}{\pi} \cos^{-1} \rho^{A,B}$, and in the case of dynamic learning we
find $\epsilon^m = \frac{1}{\pi} \cos^{-1} \rho^{A,C}$. In order to compare these two errors,
where in the first one learning is described by negative $\rho$ and in the second by positive
$\rho$, we define $\rho = |\rho^{A,B}| = |\rho^{A,C}|$. Substituting $\rho$ into both functions
above, we get

$$\epsilon^m = \frac{1}{\pi} \cos^{-1} \rho . \qquad (17)$$



We present in this paper a flipping attacker, which makes use of the absolute value of the
local field. The attacker estimates that the unit with the smallest absolute local field is the
one that is most probably wrong, i.e., the one that has a different output,
$\tau_i^C \neq \tau_i^A$. The origin of this assumption can be easily explained by averaging
over the local field distribution. The average of the absolute value of the local field,
$\langle |h_i^C| \rangle$, given an overlap $\rho_i^{A,C}$ between two matching hidden units and
norm $Q_i^C$ of the weight vector in this unit, is found to be

$$\langle |h_i^C| \rangle = \sqrt{\frac{Q_i^C}{2\pi}}\, \left(1 \pm \rho_i^{A,C}\right) , \qquad (18)$$

where the sign on the right-hand side of the equation is plus for agreement between the outputs
and minus for disagreement. Since $\rho$ varies between $-1$ and $1$, and in a state of partial
learning $0 < \rho < 1$, a small absolute local field signals a mistake in the unit's output.
The flipping attacker uses this knowledge in her learning procedure, as discussed in section
V D.
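The heuristic behind the flipping attack can be illustrated by sampling. The sketch below draws pairs of unit-variance Gaussian local fields with correlation $\rho$ (an assumption that corresponds to normalized weight vectors, $Q = 1$) and compares the mean absolute field of C in agreeing and in disagreeing units. It also compares the averages restricted to agreement and disagreement, $\langle |h^C|\, \mathbb{1}_{\pm} \rangle$, with $(1 \pm \rho)/\sqrt{2\pi}$, which follows from the bivariate Gaussian distribution. Function and key names are hypothetical.

```python
import math
import random


def field_statistics(rho, samples=40_000, seed=1):
    """Monte Carlo averages of |h_C| split by agreement of sign(h_A) and sign(h_C)."""
    rng = random.Random(seed)
    total = {True: 0.0, False: 0.0}
    count = {True: 0, False: 0}
    for _ in range(samples):
        h_a = rng.gauss(0.0, 1.0)
        h_c = rho * h_a + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        agree = (h_a > 0) == (h_c > 0)
        total[agree] += abs(h_c)
        count[agree] += 1
    return {
        "mean_abs_agree": total[True] / count[True],
        "mean_abs_disagree": total[False] / count[False],
        "joint_agree": total[True] / samples,
        "joint_disagree": total[False] / samples,
    }


stats = field_statistics(0.8)
print(stats)
print((1 + 0.8) / math.sqrt(2 * math.pi), (1 - 0.8) / math.sqrt(2 * math.pi))
```

For a positive overlap the disagreeing units have a clearly smaller mean absolute field, which is what makes the unit with the smallest $|h^C|$ the natural candidate for a flip.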

The analytical study of this attacker includes averages over the probability distribution of the
local field in the third party, the attacker C, given the local fields of the two machines. This
probability is given by

$$P(h^C | h^B, h^A, \{\rho, Q\}) = \frac{P(h^A, h^B, h^C | \{\rho, Q\})}{P(h^A, h^B | \{\rho, Q\})} , \qquad (19)$$

where $P(h^A, h^B, h^C | \{\rho, Q\})$ and $P(h^A, h^B | \{\rho, Q\})$ are the joint probability
distributions of the three local fields and of two local fields, respectively, and they are
derived from the correlation matrix similarly to Eq. (15).

In the discrete case, when the increment is finite (see for instance Eq. (12)), the above order
parameters no longer suffice for the macroscopic description of the dynamics, even in the
thermodynamic limit, $N \to \infty$. However, the distributions above do hold. The dynamics
cannot be analyzed with the standard equations of motion based on differential equations of the
order parameters with respect to $\alpha$, the number of examples per input dimension. We
introduce a generic method for analyzing the discrete case by extending the macroscopic
parameters and deriving macro-dynamical updating equations (see section V).



III. MAPPING PROCEDURE 

One can map mutual learning in the parity case onto 
mutual learning in K perceptrons. The mapping to noisy 
perceptron introduced for analyzing on-line learning in 
TPM is inadequate in the case of mutual learning 
where the updating depends on the matching between 
the outputs but is independent of their specific sign. Nev- 
ertheless, a different mapping from TPM to noisy per- 
ceptrons can be used for the mutual learning case. The 
mapping presentation is much simplified in the continu- 
ous case since assuming random initial conditions to all 
hidden units results in the same overlap for all hidden 
units, $\rho_i = \rho \ \forall i$. Therefore, we first assume that all the
overlaps between matching hidden units are the same. 
Hence, updating K perceptrons is equivalent to one up- 
dating in the TPM. The presentation of the mapping 
below is simplified by the restriction of K = 3 and the 
generalization to any K is straightforward. 

We have TPMs that consist of non-overlapping receptive fields with random inputs. Hence in each
of the TPMs all 8 internal representations appear with equal probability. A specific hidden
unit is updated when the following two conditions are fulfilled: (a) there is a mismatch between
the results of the two TPMs, and (b) the state of the hidden unit is the same as the output of
the TPM. We make use of $\epsilon$, the probability of having different results in two hidden
units whose overlap is $\rho$, which is given by

$$\epsilon = \frac{1}{\pi} \cos^{-1} \rho . \qquad (20)$$

We concentrate on a specific pair of matched hidden units. Given that the outputs of the hidden
units are different, there is a probability, $P_1$, that the TPMs' results are different, and in
one half of these cases the TPM output is the same as the output of its hidden unit and
therefore both hidden units in both machines are updated. This probability is given by

$$P_1 = P(\sigma^A \neq \sigma^B | \tau_i^A \neq \tau_i^B) = \epsilon^2 + (1 - \epsilon)^2 . \qquad (21)$$

Similarly, the probability that there is a mismatch between the two TPMs, given that there is
agreement between the two hidden units, is given by

$$P_2 = P(\sigma^A \neq \sigma^B | \tau_i^A = \tau_i^B) = 2 \epsilon (1 - \epsilon) . \qquad (22)$$

In this case only one of the hidden units has the same sign as the output of its TPM, and only
that hidden unit is updated.
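The two conditional probabilities can be verified by summing over the states of the remaining hidden units. The sketch below assumes, as in the text, that every pair of matching units differs independently with the same probability $\epsilon$; the function names are hypothetical.

```python
from itertools import product
from math import acos, isclose, pi


def unit_error(rho):
    """Probability that two hidden units with overlap rho give different outputs."""
    return acos(rho) / pi


def output_mismatch_given(first_differs, eps, k=3):
    """P(sigma_A != sigma_B) given whether the chosen pair of hidden units differs."""
    total = 0.0
    for others in product((True, False), repeat=k - 1):
        weight = 1.0
        for differs in others:
            weight *= eps if differs else 1 - eps
        n_differ = sum(others) + (1 if first_differs else 0)
        if n_differ % 2 == 1:  # parity outputs differ iff an odd number of units differ
            total += weight
    return total


eps = unit_error(0.3)
p1 = output_mismatch_given(True, eps)
p2 = output_mismatch_given(False, eps)
assert isclose(p1, eps ** 2 + (1 - eps) ** 2)
assert isclose(p2, 2 * eps * (1 - eps))
print(round(eps, 4), round(p1, 4), round(p2, 4))
```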

These probabilities are introduced into the updating procedure of the hidden units, the
perceptrons. In the continuous case they affect the form of the equations of motion (see Eq.
(23)). In the discrete case they are introduced in a different manner, as described in section
V.

IV. CONTINUOUS TREE PARITY MACHINES 

Relying on the mapping procedure described above, mutual and dynamic learning in continuous TPMs
can be mapped onto learning scenarios in continuous perceptrons. The updating rule can be
redefined so that it is suitable for a perceptron, where the kind of updating depends on the
above probabilities, $P_1$ and $P_2$, Eqs. (21) and (22). The standard on-line equations consist
of an average over the order parameters [?], and now contain additional random variables. The
average over these additional variables is taken by introducing auxiliary random parameters, as
described in the following section.
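One way to read this mapping is sketched below, as an illustration rather than the derivation itself. The function `update_factors` draws three auxiliary uniform random numbers and decides whether the perceptron of A, of B, or both are updated: if the two units differ, both are updated with probability $P_1 / 2$; if they agree, exactly one of them, chosen at random, is updated with probability $P_2$, and it moves against the common output, which is encoded here as a factor $-1$. The function `drift` evaluates the flow of the overlap obtained by averaging this rule over Gaussian local fields; its closed form is an assumption of this sketch, checked only through its fixed points at $\rho = \pm 1$ and the learning rate at which $\rho = -1$ becomes attractive. All names are hypothetical.

```python
import math
import random


def mismatch_probs(eps):
    """P1 and P2: output mismatch given that the two units differ / agree."""
    return eps ** 2 + (1 - eps) ** 2, 2 * eps * (1 - eps)


def update_factors(tau_a, tau_b, eps, rng):
    """Return (delta_a, delta_b) for one mapped perceptron step."""
    p1, p2 = mismatch_probs(eps)
    p_alpha, p_beta, p_gamma = rng.random(), rng.random(), rng.random()
    if tau_a != tau_b:
        both = 1 if p_alpha < p1 / 2 else 0
        return both, both
    if p_beta < p2:
        return (-1, 0) if p_gamma < 0.5 else (0, -1)
    return 0, 0


def drift(rho, eta):
    """Assumed averaged flow d(rho)/d(alpha) of the mapped dynamics."""
    eps = math.acos(rho) / math.pi
    p1, p2 = mismatch_probs(eps)
    gain = eta / math.sqrt(2 * math.pi) * (1 - rho ** 2) * (p1 - p2)
    loss = 0.5 * eta ** 2 * (p1 * eps * (1 + rho) + p2 * (1 - eps) * rho)
    return gain - loss


def critical_eta():
    """Learning rate at which rho = -1 turns attractive in the linearized flow."""
    return 4 / (math.sqrt(2 * math.pi) * (1 - 4 / math.pi ** 2))


rng = random.Random(0)
both_updated = sum(update_factors(1, -1, 0.3, rng) == (1, 1) for _ in range(20_000))
print(both_updated / 20_000, mismatch_probs(0.3)[0] / 2)
print(round(critical_eta(), 3))
```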




Figure 3: The fixed point $\rho_f$ as a function of $\eta$ for the continuous TPM as obtained
from the solution of Eq. (25) (solid line). Simulation results for some values of $\eta$ are
presented by stars. Inset: Analytical (solid lines) and simulation results in the case of
$\eta = 2$ (triangles) and $\eta = 3$ (circles) for $\langle \rho \rangle$ as a function of
$\alpha$. All simulations are carried out with $N = 5000$ and averaged over 20 samples.



A. Anti-Parallel Learning 



In this scenario the updating rules of the TPMs are
given in Eqs. (11), where we have three hidden units,
$K = 3$. The rules are mapped onto perceptron learning,
employing the probabilities above, by introducing
auxiliary random parameters $p_\alpha$, $p_\beta$, $p_\gamma$, which
are uniformly distributed between 0 and 1. The updating
rule is calculated as a function of these parameters in the
following manner,



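\[
\mathbf{w}^{A+} = \frac{\mathbf{w}^A + \frac{\eta}{N}\,\mathbf{x}\,\tau^B \Lambda_A}{\left|\mathbf{w}^A + \frac{\eta}{N}\,\mathbf{x}\,\tau^B \Lambda_A\right|},
\qquad
\mathbf{w}^{B+} = \frac{\mathbf{w}^B + \frac{\eta}{N}\,\mathbf{x}\,\tau^A \Lambda_B}{\left|\mathbf{w}^B + \frac{\eta}{N}\,\mathbf{x}\,\tau^A \Lambda_B\right|},
\tag{23}
\]

where $\Theta$ denotes the step function and

\[
\begin{aligned}
\Lambda_A &= \Theta(-\tau^A\tau^B)\,\Theta\!\left(\tfrac{P_1}{2} - p_\alpha\right) - \Theta(\tau^A\tau^B)\,\Theta(P_2 - p_\beta)\,\Theta\!\left(\tfrac12 - p_\gamma\right),\\
\Lambda_B &= \Theta(-\tau^A\tau^B)\,\Theta\!\left(\tfrac{P_1}{2} - p_\alpha\right) - \Theta(\tau^A\tau^B)\,\Theta(P_2 - p_\beta)\,\Theta\!\left(p_\gamma - \tfrac12\right).
\end{aligned}
\]

The auxiliary random variables are introduced according
to the following logic. In one half of the cases in which
both the units and the TPMs disagree, no update occurs
in the units (since their sign does not match the TPM's
sign), and hence $P_1$ is divided by 2 in the expressions
above. The second scenario in which updating occurs is
when the units have the same sign and the TPMs disagree,
so that one of the units is updated and the other is not.
The auxiliary random number $p_\gamma$ determines
(randomly) which of the two units is updated.

In order to calculate the equations of motion, one first
multiplies the updated vectors, Eq. (23), and then
performs two averages: over the joint probability
distribution of the local fields and over the random
parameters $p_\alpha$, $p_\beta$ and $p_\gamma$. The result of these
two averages is an equation for the normalized overlap
$\rho$ that depends only on $\rho$ or, equivalently, on the
angle $\theta$ (see Eq. [?]),

\[
\frac{d\rho}{d\alpha} =
\eta\left[\frac{\theta^2}{\pi^2} + \left(1-\frac{\theta}{\pi}\right)^2\right]
\left[\frac{1-\rho^2}{\sqrt{2\pi}} - \frac{\eta}{2}\,\frac{\theta}{\pi}\,(1+\rho)\right]
- 2\eta\,\frac{\theta}{\pi}\left(1-\frac{\theta}{\pi}\right)
\left[\frac{1-\rho^2}{\sqrt{2\pi}} + \frac{\eta}{2}\,\rho\left(1-\frac{\theta}{\pi}\right)\right],
\tag{24}
\]

where $\alpha$ is the number of examples per input dimension.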
The points $\rho = \pm 1$ are fixed points of the equation of
motion above. Both are repulsive when the learning rate,
$\eta$, is small. As soon as $\eta > \eta_c \simeq 2.68$ a phase transition
occurs: the $\rho = -1$ fixed point becomes attractive
and a new phase arises, in which the two machines are fully
synchronized. The asymptotic decay of $\rho$ to synchronization
scales exponentially with $\alpha$, as can be found by
expanding the terms in Eq. (24) around $\theta = \pi$. Apart from
the fixed points discussed above, for any $\eta$ smaller than $\eta_c$
there is a different attractive fixed point, as can be found
by solving Eq. (24) numerically. The fixed point $\theta_f$ ($\rho_f$)
is the exact angle (overlap), at a specific learning rate $\eta$,
at which the right-hand side of the equation vanishes:

\[
\eta = \sqrt{\frac{2}{\pi}}\;
\frac{\sin^2\theta_f \left(1 - \frac{2\theta_f}{\pi}\right)^2}
{(1+\cos\theta_f)\,\frac{\theta_f}{\pi}\left[\frac{\theta_f^2}{\pi^2} + \left(1-\frac{\theta_f}{\pi}\right)^2\right]
 + 2\cos\theta_f\,\frac{\theta_f}{\pi}\left(1-\frac{\theta_f}{\pi}\right)^2}.
\tag{25}
\]

In Figure 3 we plot the fixed points as a function of
$\eta$, as found numerically from Eq. (25). Simulation
results for spherical TPMs with $N = 5000$, averaged
over 20 samples, are in agreement with the analysis, as
indicated by the few tested cases presented by the symbols.
Clearly, the system undergoes a phase transition from a
partial to a perfect anti-parallel state at $\eta_c \simeq 2.68$. One
instance for each of the phases is given in the inset of
Figure 3: the development of $\langle\rho\rangle$, averaged
over the three hidden units and 20 samples, in the case
of partial mutual learning, $\eta = 2$ (triangles), and in the
case of anti-parallel synchronization, $\eta = 3$ (circles), as a
function of $\alpha$. Numerical
solutions of the analytical equation, Eq. (24),
are presented by the solid lines.
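
The flow described above can be explored numerically. The following is a minimal sketch, assuming that the right-hand side of the overlap equation takes the form coded in `drho_dalpha` (our reading of Eq. (24), with $\epsilon = \theta/\pi$); the function names, the Euler step and the integration length are illustrative choices, not part of the original analysis. The constant `ETA_C` comes from linearizing that assumed form around $\theta = \pi$.

```python
import math

def drho_dalpha(rho, eta):
    """Assumed right-hand side of the overlap equation (our reading of Eq. (24))."""
    rho = max(-1.0, min(1.0, rho))
    eps = math.acos(rho) / math.pi            # epsilon = theta / pi
    p1 = eps ** 2 + (1.0 - eps) ** 2          # Eq. (21)
    p2 = 2.0 * eps * (1.0 - eps)              # Eq. (22)
    first_order = eta * (1.0 - rho ** 2) * (p1 - p2) / math.sqrt(2.0 * math.pi)
    second_order = 0.5 * eta ** 2 * ((1.0 + rho) * eps * p1 + rho * (1.0 - eps) * p2)
    return first_order - second_order

def integrate(eta, rho0=0.0, alpha_max=60.0, step=0.01):
    """Plain Euler integration of d(rho)/d(alpha), starting from overlap rho0."""
    rho = rho0
    for _ in range(int(alpha_max / step)):
        rho += step * drho_dalpha(rho, eta)
        rho = max(-1.0, min(1.0, rho))
    return rho

# Critical learning rate from linearizing the assumed form around theta = pi.
ETA_C = (1.0 / math.sqrt(2.0 * math.pi)) / (0.25 - 1.0 / math.pi ** 2)

if __name__ == "__main__":
    print("eta_c =", ETA_C)
    print("eta = 2:", integrate(2.0))   # partial anti-parallel overlap
    print("eta = 3:", integrate(3.0))   # flows towards rho = -1
```

Under these assumptions the sketch reproduces the two phases: below `ETA_C` the overlap settles at an intermediate negative value, above it the flow reaches $\rho = -1$.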



B. Dynamic Learning 

In the last section we showed a procedure that leads to
full synchronization. In the following we check the ability
of a third TPM, an attacker, to learn the weight vectors
of the two parties. The third machine, C, which tries to
imitate A, updates its weight vectors only when the two
parties are updated, and only in the hidden units that match
the output of party A. Mathematically, this is defined as
follows



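\[
\mathbf{w}_i^{C+} =
\frac{\mathbf{w}_i^C + \frac{\eta}{N}\,\mathbf{x}_i\,\sigma^B\,\Theta(-\sigma^A\sigma^B)\,\Theta(\sigma^A\tau_i^C)}
{\left|\mathbf{w}_i^C + \frac{\eta}{N}\,\mathbf{x}_i\,\sigma^B\,\Theta(-\sigma^A\sigma^B)\,\Theta(\sigma^A\tau_i^C)\right|}.
\tag{26}
\]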



Continuing the same line of introducing probabilities in 
the mutual learning procedure, one can write a set of 
updating rules for the dynamic and mutual learning in 
perceptrons which is equivalent to TPMs learning. This 
is given by 



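\[
\mathbf{w}^{A+} = \frac{\mathbf{w}^A + \frac{\eta}{N}\,\mathbf{x}\,\tau^B \Lambda_A}{\left|\mathbf{w}^A + \frac{\eta}{N}\,\mathbf{x}\,\tau^B \Lambda_A\right|},
\qquad
\mathbf{w}^{B+} = \frac{\mathbf{w}^B + \frac{\eta}{N}\,\mathbf{x}\,\tau^A \Lambda_B}{\left|\mathbf{w}^B + \frac{\eta}{N}\,\mathbf{x}\,\tau^A \Lambda_B\right|},
\qquad
\mathbf{w}^{C+} = \frac{\mathbf{w}^C + \frac{\eta}{N}\,\mathbf{x}\,\tau^B \Lambda_C}{\left|\mathbf{w}^C + \frac{\eta}{N}\,\mathbf{x}\,\tau^B \Lambda_C\right|},
\tag{27}
\]

where

\[
\begin{aligned}
\Lambda_A ={}& \Theta(-\tau^A\tau^B)\,\Theta(P_1 - p_\alpha)\,\Theta\!\left(\tfrac12 - p_\delta\right)
 - \Theta(\tau^A\tau^B)\,\Theta(P_2 - p_\beta)\,\Theta\!\left(\tfrac12 - p_\gamma\right),\\
\Lambda_B ={}& \Theta(-\tau^A\tau^B)\,\Theta(P_1 - p_\alpha)\,\Theta\!\left(\tfrac12 - p_\delta\right)
 - \Theta(\tau^A\tau^B)\,\Theta(P_2 - p_\beta)\,\Theta\!\left(p_\gamma - \tfrac12\right),\\
\Lambda_C ={}& \Theta(-\tau^A\tau^B)\,\Theta(\tau^A\tau^C)\,\Theta(P_1 - p_\alpha)\,\Theta\!\left(\tfrac12 - p_\delta\right)
 - \Theta(\tau^A\tau^B)\,\Theta(\tau^A\tau^C)\,\Theta(P_2 - p_\beta)\,\Theta\!\left(\tfrac12 - p_\gamma\right)\\
 &- \Theta(-\tau^A\tau^B)\,\Theta(-\tau^A\tau^C)\,\Theta(P_1 - p_\alpha)\,\Theta\!\left(p_\delta - \tfrac12\right)
 + \Theta(\tau^A\tau^B)\,\Theta(-\tau^A\tau^C)\,\Theta(P_2 - p_\beta)\,\Theta\!\left(p_\gamma - \tfrac12\right).
\end{aligned}
\]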

We introduce another random parameter, $p_\delta$, which is
redundant when one calculates only the mutual learning,
Eq. (23), but is necessary for deriving equations of
motion for the order parameters in the case of dynamic
learning. The four terms in $\Lambda_C$ represent the four
possibilities that cause an update in the attacker's hidden
unit. For instance, the first term of $\Lambda_C$ represents the
case where the hidden units in the attacker and in the first
TPM have the same state, the TPMs' outputs are different
(indicated by $P_1$), and the outputs of the hidden units
of A and B are the same as those of their TPMs (the probability
for such an event is $\tfrac12$).

The equation of motion after synchronization, i.e.,
when $\rho_{A,B} = -1$ and $\rho_{A,C} = -\rho_{B,C}$, is derived by
averaging Eqs. (27) over the joint probability distribution
given in Eq. (16). It depends on the learning rate
and the overlap $\rho_{A,C}$ and is given explicitly by



\[
\frac{d\rho_{A,C}}{d\alpha} = \frac{\eta^2}{2}\left[1 - \frac{1}{\pi}\cos^{-1}\rho_{A,C} - \rho_{A,C}\right].
\tag{28}
\]



This equation describes the development of the overlap
between the attacker and one of the two machines that
are synchronized in both cases, when each machine learns
the opposite of its result, Eq. (26).

As can be derived from Eq. (28), independent of the
learning rate $\eta$, there is a unique fixed point $\rho_f \simeq 0.79$.
The point $\rho = 1$ is not a fixed point at all. Note that this
fixed point describes only the failure of the continuous
attacker; the equivalent discrete attacker might synchronize
and reach $\rho = 1$, as discussed in Section V C. In
Figure 4 we present analytical (solid lines) and simulation
results (symbols) for the overlap between the attacker
and player A, $\rho_{A,C}$. We carried out simulations
with $N = 5000$, and each result is averaged over 20 runs.
Good agreement between simulation and analytical
results is seen in Figure 4 in both cases: when the
overlap is initialized to zero, $\rho_{A,C} = 0$, and, in the inset, when
the initial value of the overlap is almost 1, $\rho_{A,C} = 0.98$.
All results are for full synchronization between A and B,
$\rho_{A,B} = -1$.

Figure 4: The analytical curve of the averaged overlap, $\langle\rho\rangle$, in
dynamic learning of TPMs as obtained from Eq. (28) (solid
line), with $\eta = 10$. The initial state is $\rho = 0$. Inset: analytical
results for the dynamic learning with the initial state $\rho = 0.98$.
Symbols represent the corresponding simulations, carried out
with $N = 5000$ and averaged over 20 runs.
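
The attacker's fixed point can be located numerically. The sketch below assumes that the equation of motion after synchronization has the form $d\rho_{A,C}/d\alpha = (\eta^2/2)\,[1 - \cos^{-1}(\rho_{A,C})/\pi - \rho_{A,C}]$ (our reading of Eq. (28)); the function names and the bisection bracket are hypothetical choices made for illustration.

```python
import math

def attacker_rhs(rho, eta=1.0):
    """Assumed right-hand side of the attacker's overlap equation (our reading of Eq. (28))."""
    return 0.5 * eta ** 2 * (1.0 - math.acos(rho) / math.pi - rho)

def attacker_fixed_point(lo=0.0, hi=0.99, tol=1e-12):
    """Bisection on [lo, hi]; the right-hand side is positive at lo and negative at hi."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if attacker_rhs(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    print(attacker_fixed_point())
```

Because $\eta$ enters only as an overall prefactor in this assumed form, the root does not depend on the learning rate, in line with the statement above.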



C. Summary

In summary, we showed that a pair of randomly initialized
TPMs that perform mutual learning reaches a fully
synchronized state for $\eta > \eta_c$. We introduced here a
specific dynamics in which the parties update only when
their outputs mismatch, the updates are in opposite
directions, and the weight vectors are normalized in each
step (case A in II B). Analyzing case B, for instance,
reveals that for all $\eta$ the stationary solution is a
synchronized state. Using the dynamics appearing in II B
but without normalizing the weight vectors does not end
in a synchronized state at all. The specific algorithm
we chose contains the rich phenomenon of a phase
transition [23]. Moreover, its synchronization abilities are
closely related to the discrete synchronization studied in
the following section.

The attacker tries to learn the parties' weight vectors
but manages to achieve only partial success. This difficulty
in learning that such a naive attacker faces, as indicated
by the fixed point that differs from 1, also characterizes
the naive attacker in the other cases presented
in II B. However, the analysis is not relevant for the discrete
case studied below. In the discrete case the naive
attacker's performance is restricted too, but perfect learning
is possible, see V C. The flipping attacker that makes
use of the local fields (see V D) has a better performance
in the discrete case. An open question, which deserves
further research, is how to analyze the continuous flipping
attacker.



V. DISCRETE MACHINES

The study of discrete networks requires different methods
of analysis than those used for the continuous case.
We found that instead of examining the evolution of $R$
and $Q$, we must examine $(2L+1) \times (2L+1)$ parameters,
which describe the mutual learning process. By writing
a Markovian process that describes the development of
these parameters, one gains an insight into the learning
procedure. Thus we define a $(2L+1) \times (2L+1)$ matrix,
$F^\mu$, in which the state of the machines at time
step $\mu$ is represented. The elements of $F$ are $f_{q,r}$, where
$q, r = -L, \ldots, -1, 0, 1, \ldots, L$. The element $f_{q,r}$ represents the
fraction of components in a weight vector in which
the components of unit A are equal to $q$ and the matching
components in unit B are equal to $r$. Hence, the overlap
between the two units as well as their norms are defined
through this matrix,

\[
R = \sum_{q,r=-L}^{L} q\, r\, f_{q,r},
\qquad
Q^A = \sum_{q,r=-L}^{L} q^2 f_{q,r},
\qquad
Q^B = \sum_{q,r=-L}^{L} r^2 f_{q,r}.
\tag{29}
\]

The matrix elements are updated if and only if an
update of the weight vectors occurs.



A. Learning with Discrete Perceptrons 

The mutual learning scenario is much simplified in the
case of the perceptron; therefore we present here the full
description of the analytical procedure used for this case.
Updating is done in the case of a mismatch, and the aim
is to arrive at a state in which the weight vectors are
anti-parallel, $\rho = -1$ (we could aim at $\rho = 1$ instead, see
the manifold of possible dynamics in II A, and the results
would be equivalent). The dependence of the weight
vector in a new step on the weight vector in the former one
is given by:



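\[
\begin{aligned}
w_i^{A+} &= w_i^A + D(w_i^A x_i \sigma^B)\, x_i\, \sigma^B\, \Theta(-\sigma^A\sigma^B),\\
w_i^{B+} &= w_i^B + D(w_i^B x_i \sigma^A)\, x_i\, \sigma^A\, \Theta(-\sigma^A\sigma^B),
\end{aligned}
\tag{30}
\]

where $\sigma^{A/B}$ represents the output of machine A/B, and
$\mathbf{w}^{A/B}$ represents its weight vector.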

The update of the elements of the matrix $F$ is calculated
directly from Eq. (30), where one must average
over the input components $x_i$. On average, half
of the updated weights in one machine are increased by
1 while the matching weights in the other machine are
decreased by 1, and vice versa.

The probability of agreement or disagreement between
the parties is a function of the current overlap between
them, calculated using the matrices (see Eq. (29)). This
probability is implemented by choosing a random parameter
$p_\alpha$ in $[0,1]$. If it is smaller than $\epsilon$, as defined in
Eq. (20), the parties disagree; otherwise they agree. The
updating of the matrix elements is described as follows: for
elements with $q$ and $r$ not on the boundary
($q \neq \pm L$ and $r \neq \pm L$), the update can be written in a
simple manner,

\[
f^{+}_{q,r} = \Theta(p_\alpha - \epsilon)\, f_{q,r}
 + \Theta(\epsilon - p_\alpha)\left(\tfrac12 f_{q+1,r-1} + \tfrac12 f_{q-1,r+1}\right).
\tag{31}
\]

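For elements with both indices on the boundary, the
update is

\[
\begin{aligned}
f^{+}_{L,L} &= \Theta(p_\alpha - \epsilon)\, f_{L,L},\\
f^{+}_{-L,-L} &= \Theta(p_\alpha - \epsilon)\, f_{-L,-L},\\
f^{+}_{L,-L} &= \Theta(p_\alpha - \epsilon)\, f_{L,-L} + \Theta(\epsilon - p_\alpha)
 \left(\tfrac12 f_{L-1,-L+1} + \tfrac12 f_{L-1,-L} + \tfrac12 f_{L,-L+1} + \tfrac12 f_{L,-L}\right),\\
f^{+}_{-L,L} &= \Theta(p_\alpha - \epsilon)\, f_{-L,L} + \Theta(\epsilon - p_\alpha)
 \left(\tfrac12 f_{-L+1,L-1} + \tfrac12 f_{-L+1,L} + \tfrac12 f_{-L,L-1} + \tfrac12 f_{-L,L}\right).
\end{aligned}
\tag{32}
\]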



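For elements with just one of the indices on the boundary
($q = \pm L$ and $r \neq \pm L$, or vice versa), the update is

\[
\begin{aligned}
f^{+}_{q,L} &= \Theta(p_\alpha - \epsilon)\, f_{q,L} + \Theta(\epsilon - p_\alpha)\left(\tfrac12 f_{q+1,L-1} + \tfrac12 f_{q+1,L}\right),\\
f^{+}_{q,-L} &= \Theta(p_\alpha - \epsilon)\, f_{q,-L} + \Theta(\epsilon - p_\alpha)\left(\tfrac12 f_{q-1,-L+1} + \tfrac12 f_{q-1,-L}\right),\\
f^{+}_{L,r} &= \Theta(p_\alpha - \epsilon)\, f_{L,r} + \Theta(\epsilon - p_\alpha)\left(\tfrac12 f_{L-1,r+1} + \tfrac12 f_{L,r+1}\right),\\
f^{+}_{-L,r} &= \Theta(p_\alpha - \epsilon)\, f_{-L,r} + \Theta(\epsilon - p_\alpha)\left(\tfrac12 f_{-L+1,r-1} + \tfrac12 f_{-L,r-1}\right).
\end{aligned}
\tag{33}
\]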
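
The matrix procedure for the perceptron can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: it assumes that a weight already at $\pm L$ stays in place when the update would push it past the boundary (so interior and boundary elements are handled by one clipping rule), that $\epsilon = \theta/\pi$ with $\cos\theta = R/\sqrt{Q^A Q^B}$, and that synchronization is declared at a hypothetical threshold of $-0.99$. All function names are invented for this example.

```python
import math
import random

def uniform_matrix(L):
    """Matrix for randomly initialized weights: every entry is 1/(2L+1)^2."""
    n = 2 * L + 1
    return [[1.0 / n ** 2 for _ in range(n)] for _ in range(n)]

def order_parameters(F, L):
    """Return (R, Q_A, Q_B); row/column index k stands for the weight value k - L."""
    R = QA = QB = 0.0
    for a, row in enumerate(F):
        for b, f in enumerate(row):
            q, r = a - L, b - L
            R += q * r * f
            QA += q * q * f
            QB += r * r * f
    return R, QA, QB

def overlap(F, L):
    R, QA, QB = order_parameters(F, L)
    return R / math.sqrt(QA * QB)

def apply_update(F, L):
    """One update step: half of each entry moves by (+1, -1), half by (-1, +1),
    with both indices clipped to the allowed range (assumed boundary rule)."""
    n = 2 * L + 1
    new = [[0.0] * n for _ in range(n)]
    clip = lambda k: max(0, min(n - 1, k))
    for a in range(n):
        for b in range(n):
            half = 0.5 * F[a][b]
            new[clip(a + 1)][clip(b - 1)] += half
            new[clip(a - 1)][clip(b + 1)] += half
    return new

def steps_to_sync(L, threshold=-0.99, seed=0, max_steps=100000):
    """Count steps until the overlap drops to the threshold; None if never reached."""
    rng = random.Random(seed)
    F = uniform_matrix(L)
    for step in range(1, max_steps + 1):
        rho = max(-1.0, min(1.0, overlap(F, L)))
        eps = math.acos(rho) / math.pi
        if rng.random() < eps:          # the parties disagree, so both update
            F = apply_update(F, L)
        if overlap(F, L) <= threshold:
            return step
    return None

if __name__ == "__main__":
    print(steps_to_sync(L=1))
```

Repeating `steps_to_sync` with different seeds gives a distribution of synchronization times, since the only randomness is the sequence of auxiliary numbers.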

The main quantity of interest is the number of steps
required in order to arrive at a state of full synchronization.
In simulations there is a discrete transition from
an overlap which is almost anti-parallel to a completely
anti-parallel state. This is due to the finite nature of the
vectors: the largest value of the overlap before synchronization
is $-1 + O(1/N)$. In simulations with $N = 10^4$, for
example, the largest value of the overlap before full
synchronization is $\rho = -0.99999$, and this is the value we used
in our analytical procedure for defining full synchronization,
for comparison with simulations with $N = 10^4$.

Our results indicate that the order parameters are not
self-averaged quantities [11]. Several runs with the same
$N$ result in different curves for the order parameters as
a function of the number of steps, see Figure 5. This
explains the non-zero variance of $\rho$ as a result of the
fluctuations in the local fields induced by the input, even
in the thermodynamic limit.

Figure 5: The averaged overlap $\langle\rho\rangle$ and its standard deviation
as a function of the number of steps, as found from the
analytical results (solid line) and simulation results (circles)
of mutual learning in TPMs. Inset: analytical results (solid
line) and simulation results (circles) for the perceptron,
with $L = 1$ and $N = 10^4$.

In the inset of Figure 5 we present the averaged numerical
results derived from the analytical equations, Eqs. (31), (32)
and (33), for synchronization in the perceptron (solid line) with
$L = 1$, $w_i = \pm 1, 0$. The analytical results are averaged
over 500 samples and the non-zero standard deviations
are not shown in order to simplify the presentation.
Simulation results with $L = 1$ ($w_i = \pm 1, 0$) and
$N = 10^4$, averaged over 500 samples, are presented by
the circles; error bars are standard deviations. Note that
even though the matrix elements were initialized with the
same values in each run, there is still a non-zero standard
deviation due to fluctuations in the local fields as
a function of the particular set of random inputs, even in
the thermodynamic limit.

For the perceptron, synchronization is much easier and
faster to achieve than for the TPM. Take for example the
case where $L = 1$. If for three consecutive steps both the
other party's output and $x_i$ were positive, an attacker
can surely know that $w_i = 1$, while this is not so in the
TPM, where the attacker cannot know for sure whether
the unit was updated or not. Therefore, the TPM is
much more suitable for building a cryptosystem than the
perceptron.



B. Synchronization in TPMs 

Mutual learning in discrete TPMs is described by mutual
learning in discrete noisy perceptrons. As the TPM
consists of three hidden units (each evolving differently),
we now have three different angles, $\theta_i$ where $i = 1, 2, 3$,
one for each hidden unit. Since the dynamics are not
self-averaged, we use probabilities similar to those introduced
in Eq. (21). The definitions of these probabilities are
extended to include all three hidden units, each
characterized by its own angle, giving $P_1^i$ and $P_2^i$. The probability
$P_1^i(\sigma^A \neq \sigma^B \mid \tau_i^A \neq \tau_i^B)$ is given by



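\[
P_1^i = \epsilon_j \epsilon_k + (1-\epsilon_j)(1-\epsilon_k),
\tag{34}
\]

where $j$ and $k$ label the other two hidden units.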



Similarly, the probability that there is a mismatch between
the two TPMs given that there is agreement between
the $i$th pair of hidden units, for instance, is given
by

\[
P_2^i = \epsilon_j (1-\epsilon_k) + \epsilon_k (1-\epsilon_j).
\tag{35}
\]



Here, as well as in the continuous case, we chose a se- 
quence of random parameters to represent the particular 
choice of random inputs. 
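
The per-unit probabilities can be checked by brute-force enumeration. The sketch below assumes that the pairs of hidden units disagree independently with probabilities $\epsilon_1, \epsilon_2, \epsilon_3$ and uses the fact that the TPM outputs differ exactly when an odd number of pairs disagree; the function name and the numerical values are hypothetical.

```python
from itertools import product

def mismatch_probability(eps, i, unit_disagrees):
    """P(sigma_A != sigma_B | state of pair i), summing over the other pairs.

    eps[j] is the probability that pair j of hidden units disagrees.
    """
    others = [e for j, e in enumerate(eps) if j != i]
    total = 0.0
    for pattern in product([0, 1], repeat=len(others)):   # 1 means "pair disagrees"
        prob = 1.0
        for e, d in zip(others, pattern):
            prob *= e if d else 1.0 - e
        n_disagree = sum(pattern) + (1 if unit_disagrees else 0)
        if n_disagree % 2 == 1:        # outputs are products of the hidden units
            total += prob
    return total

if __name__ == "__main__":
    eps = [0.2, 0.3, 0.4]
    print(mismatch_probability(eps, 0, True))    # compare with Eq. (34)
    print(mismatch_probability(eps, 0, False))   # compare with Eq. (35)
```

For unit 1 with these values, the enumeration agrees with $\epsilon_2\epsilon_3 + (1-\epsilon_2)(1-\epsilon_3)$ and $\epsilon_2(1-\epsilon_3) + \epsilon_3(1-\epsilon_2)$.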

We follow each hidden unit separately and therefore
we have three matrices, $F^i$. We initialize the weights
randomly; therefore the matrices in the initial state have
the value $1/(2L+1)^2$ in each entry. In each step, two
sets of random parameters are chosen and are used to
set a specific realization of the internal presentation for
the parties. The first set is used to define agreement or
disagreement between each pair of hidden units, as done
in the perceptron case, Section V A.



All in all, due to inversion symmetry, when $K = 3$
there are four possible results for the internal presentations,
$+++$, $+--$, $-+-$ or $--+$, and accordingly
$4 \times 4$ possible states for which the parties' outputs do
not match and an update is performed. We then use
the second set of random parameters for defining the
specific internal presentation in one of the TPMs, and
therefore immediately in the other, according to their
agreement or disagreement.

The case when the three hidden units disagree is
exemplified below. There is a possibility that all hidden
units are updated (case (b) in II A), or only one of them
(case (b) describes two of the hidden units and case (d)
describes the third). In two of the eight such internal
presentations all three hidden units are updated, whereas
in the other six only one of them is updated, so that
we must choose which one. All of these possibilities are
equally probable, independent of $\theta_i$. Therefore, we take
all the possible internal scenarios into account and, for
instance, when after using the auxiliary random numbers
all three hidden units disagree, we choose $p_\alpha$ at random
and update accordingly,



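\[
f^{i+}_{q,r} = \Theta\!\left(\tfrac14 - p_\alpha\right)\left(\tfrac12 f^{i}_{q+1,r-1} + \tfrac12 f^{i}_{q-1,r+1}\right)
 + \Theta\!\left(\tfrac{i+1}{4} - p_\alpha\right)\Theta\!\left(p_\alpha - \tfrac{i}{4}\right)\left(\tfrac12 f^{i}_{q+1,r-1} + \tfrac12 f^{i}_{q-1,r+1}\right).
\tag{36}
\]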



The first term corresponds to the case where all three
hidden units are updated (with probability $\tfrac14$). The second
term corresponds to the case where only one hidden
unit is updated. Eq. (36) is valid only for $q$ and $r$ which
are not on the boundary.

Figure 6: The distributions of the synchronization time (dashed
line) and the dynamic learning time (solid line) from the analytical
results for TPMs with $L = 1$. Symbols stand for the simulation
results, with $N = 10000$.
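
The counting of internal presentations quoted above (two of eight update all three units, six update exactly one) can be verified by enumeration. This sketch assumes only that a hidden unit is updated when its sign equals the output of its own TPM, the output being the product of the three hidden units; the helper name is hypothetical.

```python
from itertools import product
from collections import Counter

def updated_units(tau):
    """Indices of hidden units whose sign equals the TPM output (product of all units)."""
    sigma = 1
    for t in tau:
        sigma *= t
    return [k for k, t in enumerate(tau) if t == sigma]

# Number of updated units for each of the 8 internal presentations.
counts = Counter(len(updated_units(tau)) for tau in product([-1, 1], repeat=3))

if __name__ == "__main__":
    print(dict(counts))
    print(updated_units((-1, 1, 1)), updated_units((-1, -1, 1)))
```

The last line illustrates one-sided updates: a presentation $-++$ updates only its first unit, while $--+$ updates only its third.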

In the case of the perceptron, when an update occurs
both sides perform the update, in opposite directions. In
the case of the TPMs, two matching units do not always
perform an update together; in many cases one of the
parties updates unit $i$ while the other updates unit $j$,
$i \neq j$, as described in case (c) in II A. In such a case,
Eq. (36) is not sufficient, and we should add a description
of the matrices' update when only one party is updated.
Let us say the party represented by the matrix rows is
updated. Then we have



\[
f^{i+}_{q,r} = \tfrac12 f^{i}_{q+1,r} + \tfrac12 f^{i}_{q-1,r},
\tag{37}
\]



and if the party represented by the matrix columns is 
updated, we have 



\[
f^{i+}_{q,r} = \tfrac12 f^{i}_{q,r+1} + \tfrac12 f^{i}_{q,r-1},
\tag{38}
\]



where we limit the description to $q$, $r$ which are not
on the boundary. An example is the case when the internal
presentation of party A is $-++$ and that of B
is $--+$. Then party A updates unit 1, Eq. (37) with
$i = 1$, while party B updates unit 3, Eq. (38) with $i = 3$.

In Figure 6 we present the distribution of time steps
for synchronization according to simulations with
$N = 10{,}000$ (*) and according to the analytical results (solid
line) in the case of $L = 1$, taken from 500 different runs.
The evolution of the average overlap in this case is given
in Figure 5. A solid line represents the analytical results
and circles stand for simulation results. Both standard
deviations are indicated by the error bars. There is good
agreement between the analytical and simulation results.
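
For comparison with the macroscopic description, a direct microscopic simulation of two small discrete TPMs can be sketched as follows. This is an illustration under several assumptions that are not spelled out in this section: each party adds $x_i$ times the other party's output to the hidden units that agree with its own output, weights are clipped to $[-L, L]$, a zero local field is mapped to $+1$ for A and to $-1$ for B, and synchronization means exactly anti-parallel weights. The system size is deliberately tiny and all names are hypothetical.

```python
import random

def sign(h, zero):
    """Sign of a local field; `zero` is the value assigned to h == 0 (assumed convention)."""
    return 1 if h > 0 else -1 if h < 0 else zero

def tpm_output(w, x, zero):
    """Hidden-unit signs and their product for weights w[k][i] and inputs x[k][i]."""
    tau = [sign(sum(wi * xi for wi, xi in zip(w[k], x[k])), zero) for k in range(len(w))]
    sigma = 1
    for t in tau:
        sigma *= t
    return tau, sigma

def clipped(value, L):
    return max(-L, min(L, value))

def synchronize(K=3, N=10, L=1, seed=1, max_steps=100000):
    """Number of inputs needed until w_A == -w_B, or None if max_steps is exceeded."""
    rng = random.Random(seed)
    wA = [[rng.randint(-L, L) for _ in range(N)] for _ in range(K)]
    wB = [[rng.randint(-L, L) for _ in range(N)] for _ in range(K)]
    for step in range(max_steps + 1):
        if all(a == -b for ra, rb in zip(wA, wB) for a, b in zip(ra, rb)):
            return step
        x = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(K)]
        tauA, sigA = tpm_output(wA, x, zero=1)
        tauB, sigB = tpm_output(wB, x, zero=-1)
        if sigA != sigB:                      # update only on an output mismatch
            for k in range(K):
                if tauA[k] == sigA:
                    wA[k] = [clipped(w + xi * sigB, L) for w, xi in zip(wA[k], x[k])]
                if tauB[k] == sigB:
                    wB[k] = [clipped(w + xi * sigA, L) for w, xi in zip(wB[k], x[k])]
    return None

if __name__ == "__main__":
    print(synchronize())
```

Collecting the returned step counts over many seeds gives an empirical synchronization-time distribution of the kind compared with the analytical curves.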

          t_synch       t_naive              t_flipp
L = 1     25 ± 14       36 ± 18              32 ± 19
L = 2     79 ± 38       239 ± 145            108 ± 58
L = 3     166 ± 67      3320 ± 3039          221 ± 106
L = 4     298 ± 113     176810 ± 179446      380 ± 159

Table I: Average synchronization and dynamic learning times,
for the naive attacker and the flipping attacker, for different
values of $L$.



An attacker does not have to achieve full synchronization
in order to decipher the secret code. For finite $N$,
even a state close enough to synchronization is sufficient
to break the code, thus making the system insecure.
Moreover, the analysis and the simulations are faster
when the aim is to arrive at a partial overlap state. We
therefore consider an attacker who achieves $\langle\rho\rangle = 0.9$
to be successful, and the synchronization and learning
times given in Figure [?] and in Table I are for achieving
$\langle\rho\rangle = 0.9$.



C. The Naive Attacker 

The aim of an attacker is to synchronize with one of
the parties and reveal the secret key (the weights of the
parties); hence her natural strategy is to imitate one of
them, party A for instance, by using the same learning
rule. The attacker, eavesdropping on the public channel
connecting the parties, knows the input vector $\mathbf{x}_i$ and the
outputs $\sigma^{A/B}$. When $\sigma^A \neq \sigma^B$, the parties update their
weights, and so does the attacker. In the case where
the attacker's internal presentation is the same as A's,
they update the same units, an attractive step occurs,
and the attacker gets closer to her goal. Yet when the
internal presentations of the attacker and the party differ,
she updates some wrong units, a repulsive step occurs,
and this delays her. The $2^{K-1}$-fold degeneracy in the
output is the main reason for the attacker's failure. The
dependence of the attacker's weight vector in a new step
on the weight vector in the former one is given by



\[
w_i^{C+} = w_i^C + D(w_i^C x_i \sigma^B)\, x_i\, \sigma^B\, \Theta(-\sigma^A\sigma^B)\, \Theta(\sigma^A \tau_i^C).
\tag{39}
\]



The analysis is similar to that of the synchronization process,
given by Eq. (36). We now create 9 matrices, each
representing the state of two matching hidden units, between
the two parties and between the attacker and each party. We must
set the parties' internal presentation, as well as the
attacker's. We decide which one of the $8 \times 8 \times 8$ internal
presentations occurs in each step, following the correlation
between the parties and the attacker, and update
the matrices accordingly, as described in Section V B.

Although the attacker may synchronize before the parties,
the average learning time is around twice the
synchronization time for $L = 1$, and is around 200 times the
synchronization time for $L = 3$. It seems that the reason
for the naive attacker's weakness is that too many repulsive
steps occur; therefore, when trying to improve her
abilities, we need to increase the probability for an
attractive step and decrease the probability for a repulsive
one. It has been shown [24] that a small absolute local-field
value indicates a high probability for an error. In
the next section we present an advanced attacker which
makes use of this knowledge.



D. The Flipping Attacker 

The flipping attacker's strategy, recently introduced in
[?], adds a different move to the strategy of the naive
attacker when disagreement occurs between the outputs
of the attacker and party A. In this case, the attacker is
certain that either one or three of her hidden units are
in disagreement with A's units, and therefore a repulsive
step will occur. Since disagreement of three units is
less likely than disagreement of one unit, the attacker's
strategy treats all cases as a one-unit disagreement. The
flipping attacker tries to prevent the repulsive step by
using a "flipping" approach: she negates the sign of one
of her units before performing the update. If the correct
unit was chosen, then the "new" internal presentation
matches that of the party, and the same units will be
updated by both, thus performing an attractive step. To
raise her chances of flipping the right unit, the attacker
chooses the one whose absolute local-field value is the
lowest of the three: $\tilde\tau_i = -\tau_i$ for the $i$ that minimizes $|h_i|$.

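The learning rules are the same as those given by Eq.
(12) for the mutual synchronization, but the attacker's
learning is different,

\[
w_i^{C+} = w_i^C + D(w_i^C x_i \sigma^B)\, x_i\, \sigma^B\, \Theta(-\sigma^A\sigma^B)
\left[\Theta(\sigma^C\sigma^A)\,\Theta(\sigma^A\tau_i^C) + \Theta(-\sigma^C\sigma^A)\,\Theta(\sigma^A\tilde\tau_i^C)\right],
\tag{40}
\]

where $\tilde\tau_i^C = -\tau_i^C$ if $|h_i| < |h_j|$ for all $j \neq i$, and $\tilde\tau_i^C = \tau_i^C$ otherwise.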
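
The flipping move itself is simple to state in code. The sketch below assumes the attacker's hidden-unit signs are the signs of her local fields and that ties in $|h_i|$ are broken by taking the first minimum (a hypothetical convention); the function name is invented for this example.

```python
def flipped_internal_state(h):
    """Signs of the local fields h, with the unit of smallest |h| negated."""
    tau = [1 if field > 0 else -1 for field in h]
    weakest = min(range(len(h)), key=lambda k: abs(h[k]))
    tau[weakest] = -tau[weakest]
    return tau

if __name__ == "__main__":
    # The second unit has the smallest absolute local field, so its sign is flipped.
    print(flipped_internal_state([0.5, 0.1, -2.0]))
```

The attacker would apply this only when her output disagrees with A's output, and then update the units of the flipped state that agree with $\sigma^A$.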

The analysis used here is the same as for the naive
attacker. Here too, we follow the development of 9 matrices
which are updated at every time step, as described
in Section V B. However, in cases where the attacker's output
disagrees with A's output, we compute the probability
for every unit to be the one with the lowest absolute
local-field value. For instance, when $h_i^C > 0$ for all $i$, the
probability for $h_1^C$ being the smallest is given by:



\[
P_1 = \int_0^\infty dh_1^C\, P(h_1^C \mid h_1^A, h_1^B, \{\rho, Q\})
\int_{h_1^C}^\infty dh_2^C\, P(h_2^C \mid h_2^A, h_2^B, \{\rho, Q\})
\int_{h_1^C}^\infty dh_3^C\, P(h_3^C \mid h_3^A, h_3^B, \{\rho, Q\}),
\tag{41}
\]

where the conditional probabilities are given by Eq. (19).

Figure 7: The synchronization time and learning time
distribution for the flipping attacker, obtained by simulations
with $N = 10^3$ (diamonds/stars for synchronization/learning)
and analytical calculations (squares/circles for
synchronization/learning) with $L = 3$, averaged over $10^4$ runs.

Figure 8: The distribution of the ratio $R = t_{learn}/t_{synch}$,
obtained by simulations (dashed line) with $N = 10^3$, and
analytical (solid line) results, with $L = 3$, averaged over $10^4$
runs.

The generalization to other cases, in which $h_i^C$ is not
necessarily positive, is straightforward. We choose at
random two specific local fields for the two parties, $h_i^A$
and $h_i^B$, from their joint probability distribution, which
is derived from the correlation matrix, making use of the
overlap between the parties' units. We then proceed to
calculate the probability of each unit of the attacker to be
the one with the lowest absolute local-field value, given
by Eq. (41). Once we have $P_i$, $i = 1, 2, 3$ ($P_i$ is the
probability that unit $i$ has the lowest absolute local-field value),
we use an auxiliary random number $p_\alpha$ to choose the
unit to be flipped:



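\[
\tilde\tau_i^C = \tau_i^C \left[ 1 - 2\,\Theta\!\left(p_\alpha - \sum_{j=0}^{i-1} P_j\right) \Theta\!\left(\sum_{j=0}^{i} P_j - p_\alpha\right) \right],
\tag{42}
\]

where $P_0 = 0$.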
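
Choosing the unit to flip from the probabilities $P_i$ amounts to locating $p_\alpha$ in a partition of $[0,1]$ into consecutive intervals of lengths $P_1, P_2, P_3$. The sketch below assumes exactly that reading of the selection rule; the function names and the numerical probabilities are hypothetical.

```python
def unit_to_flip(P, p_alpha):
    """Index i such that p_alpha falls in the i-th interval of lengths P[0], P[1], ..."""
    cumulative = 0.0
    for i, p in enumerate(P):
        cumulative += p
        if p_alpha < cumulative:
            return i
    return len(P) - 1          # guard against rounding when p_alpha is close to 1

def apply_flip(tau, P, p_alpha):
    """Return a copy of tau with the selected unit negated."""
    i = unit_to_flip(P, p_alpha)
    return [-t if k == i else t for k, t in enumerate(tau)]

if __name__ == "__main__":
    print(apply_flip([1, 1, -1], [0.2, 0.5, 0.3], 0.6))
```

With these illustrative numbers, $p_\alpha = 0.6$ lies in the second interval, so the second unit is the one that is flipped.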

Simulations and analytical calculations with $L = 3$,
$N = 10^3$, averaged over $10^4$ runs, indicate that the flipping
attacker is successful. In Figure 7 we plot the
synchronization time and learning time distributions for
the flipping attack, obtained by simulations (circles for
synchronization and squares for learning) and analytical
calculations (squares for synchronization and triangles
for learning). The flipping attacker's ability can
be measured by the ratio of the attacker's learning time
to the parties' synchronization time, $R = t_{learn}/t_{synch}$.
Figure 8 shows the distribution of this ratio for simulations
(dashed line) and analytical (solid line) results.
The probability that the flipping attacker finishes learning
before synchronization is achieved by the parties is 28%,
as presented in Figure 8.



E. Discussion 

In the previous section we introduced macro-dynamical
updating equations that imitate the simulation results of
discrete mutual and dynamic learning. All numerical runs
of the macro-dynamical equations are in good agreement
with simulations. The TPMs that perform mutual learning
synchronize in a finite number of steps that scales
with $\ln N$. The macro-dynamical updating equations
describe the system in the limit $N \to \infty$, and they result
in an exponential decay of the order parameter $\rho$ to $-1$,
where reaching the exact value of $-1$ depends on computer
accuracy. However, defining synchronization
by any finite value close to $-1$ results in a synchronization
state that is achieved in a finite number of steps
even in the thermodynamic limit. The good fit in that
limit between analytical and simulation results
is indicated in Figures [?], [?] and [?]. We presented here
analytical results in the case of continuous as well as discrete
weight vectors. Recently [?], the scaling between
$N$ and $L$ was discussed, based on large-scale simulations
with different $L$ and $N$ values. It may be interesting to
develop the numerical equations in the limit of infinite $L$
and to find the appropriate interplay between these two
quantities.

We conclude by presenting the potential of the TPMs
to serve as a public key cryptosystem. This is based
upon the following features: the synchronization state
may serve as the key in a certain encryption and decryption
rule. This key evolves in public without the need
for prior communication; one needs only to perform a finite
number of steps of exchanging inputs and outputs
in order to converge to a synchronized state. The analytical
derivation shows that even for infinitely large systems,
$N \to \infty$, there will be a finite distribution of synchronization
times (where the synchronization time is defined by
$\rho = -1 + \epsilon$, with $\epsilon$ a small coefficient) and the synchronization
time itself will be finite. The flipping attacker
succeeds in revealing the secret for small $L$ values; as $L$
grows the task becomes harder for her [21]. It is yet
to be determined whether it is possible to make better
use of the information in the channel, and to devise a
strategy that performs perfect learning on average
in the same number of steps typical for synchronization,
even for large $L$.